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Abstract 

In the field of population genetics measures of genetic differentiation are 
widely used to gather information on the structure and the amount of gene 
flow between populations. These indirect measures are based on a number 
of simplifying assumptions, for instance equal population size and symmetric 

q migration. Structured populations with asymmetric migration patterns, fre- 

quently occur in nature and information about directional gene flow would 
here be of great interest. Nevertheless current measures of genetic differenti- 

. £h ation cannot be used in such systems without violating the assumptions. To 

get information on asymmetric migration patterns from genetic data rather 
complex models using maximum likelihood or Bayesian approaches generally 
need to be applied. In such models a large number of parameters are es- 
timated simultaneously and this involves complex optimization algorithms. 
We here introduce a new approach that intends to fill the gap between the 
complex approaches and the symmetric measures of genetic differentiation. 
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Our approach makes it possible to calculate a directional component of ge- 
netic differentiation at low computational effort using any of the classical 
measures of genetic differentiation. The approach is based on defining a pool 
of migrants for any pair of populations and calculating measures for genetic 
differentiation between the populations and the respective pools. The di- 
rectional measures of genetic differentiation can further be used to calculate 
asymmetric migration. The procedure is demonstrated with a simulated data 
set with known migration pattern. A comparison of the estimation results 
with the migration pattern used for simulation suggests, that our method 
captures relevant properties of migration patterns even at low migration fre- 
quencies and with few marker loci. 

Keywords: Genetic differentiation, Gene flow, Asymmetric Migration, 
Microsatellite data 



1. Introduction 



Measures of genetic differentiation are widely used standard tools in pop- 
ulation genetics. By measuring the degree of differentiation, information 
on how populations are structured can be extracted from allelel frequencies. 



The most frequently used measures are Wright's fixation index F st (|Wright 
Nei's Gst pel 



1943) 



1973) 



pendent of gene diversity (Jost, 2008) 



and the recently introduced D, which is inde- 
In combination with Wrights island 



model, measures of genetic differentiation can be used to calculate migration 
between populations (Jost, 2008, Wright, 1931, 1949). The island model is 



based on a number of simplifying assumptions such as equal population size, 
drift /migration equilibrium, symmetric migration and no selection or muta- 



tion (Whitlock and McCauley, 1999, Wright, 1931). Measures of migration 



calculated from genetic differentiation share these assumptions, all of which 



are likely to be violated in natural populations (Whitlock and McCauley 



1999). 



Structured populations with asymmetric migration between denies are 
common in nature, especially in systems driven by physical transport pro- 



cesses such as wind or water currents (Pringle et al. , 2011). Both physical 



processes and density dependent competition can lead to source and sink 
dynamics in a population and in such populations parameters as genetic 
differentiation can be significantly skewed (Dias, 1996). In marine environ- 



ments, ocean currents are known to not only affect the dispersal of planktonic 



species, many marine benthic species are also influenced by oceanic circu- 
lation since early life-stages (eggs, spores and larvae) often are planktonic 
(Cowen and Sponaugle 2009 Siegel et al. , |2003 ). Other examples where 



gene flow can be affected by physical processes are river systems and wind 



pollinated plants, moss and lichen (Friedman and Barrett, 2009, Hanfling 



and Weetman, 2006, Munoz et al. 2004). In asymmetric systems, it can 



be essential to estimate directional gene flow to understand the population 
genetic structure (e.g. Pringle et al. 2011). For instance in conservation, 



protecting a source population of a threatend species may be more efficient 
than protecting a sink population. Further all measures of genetic differenti- 
ation are symmetric with respect to the populations considered and they do 
not provide directional information. 

To get information on migration in asymmetric systems rather complex 
models using Maximum likelihood of Bayesian approaches generally need to 
be applied (e.g. Beerli and Felsenstein 2001, Wilson and Rannala 2003). 



In such models a large number of parameters are estimated simultaneously 
involving complex optimization algorithms. These models are often used 
as so called black boxes, implying that users due to the complexity of the 
approach typically only have a limited knowledge of the underlying model 
and assumptions. 

As a contrast to advanced and computationally expensive methods, we 
present a rather simple alternative that makes it possible to calculate a direc- 
tional component of genetic differentiation at low computational effort using 
any of the classical measures of genetic differentiation. Our approach gives 
information on direction of gene flow and intends to fill the gap between 
the complex approaches and the symmetric measures of genetic differentia- 
tion. The approach is based on defining a pool of migrants for each pair of 
populations and calculating measures for genetic differentiation between the 
populations and the pool. The directional measures of genetic differentiation 
can further be used to calculate asymmetric migration. This new approach 
opens up for a new field of applications where directional measures of ge- 
netic differentiation can be useful both to explore and to detect asymmetric 
migration patterns. 



2. Measures of genetic differentiation 

As a general feature, measures of genetic differentiation are zero if the 
allele frequencies of the populations are equal (total similarity), values larger 



than zero indicate differences in allele frequencies and a value of one indi- 



cate no shared alleles (total differentiation) (Meirmans and Hedrick, 2011) 



Generally they are based on two values, the mean heterozygosity in the total 
population (H t ) and the mean heterozygosity within the individual popula- 
tions (H s ) (Meirmans and Hedrick, 2011). For ease of discussion we here 



introduce vector notations of these parameters and of the measures G st and 
D. 

Let the total number of different alleles present in P populations be N. 
The allele frequencies in the individual populations can then be arranged in 
the N x P-matrix A. 



A 



a ±1 



a jvi 



a 1P 



\ 



ai, 



.op) 



(1) 



fljvp J 



where the matrix element a^ represents the frequency of allele i in population 



N 



j with J^i a ii = 1 f° r an y population j. The column vectors a,j G [0, 1] 
constitute the allele frequencies in the individual populations]^] For each 
population, the degree of heterozygosity (H) can be calculated from the 
vector of allele frequencies a e [0,1]^ as: 



Hid] 



1 — a T a 



-H 2 • (2) 

For a pairwise comparison of two populations with allele frequencies a G 
[0, 1]^ and b G [0, 1]^, within-population heterozygosity (H s ) and total- 
population heterozygosity (H t ) can be calculated from equation (]2j as: 



H t (a,b) 

H 8 (a,b) 



-(U 



m 



(3a) 
(3b) 



From these expressions the relevant measures for genetic differentiation 
between populations with allele frequencies a and b can be defined using 
vector algebra: 



1 Please note, that bold letters indicate vectors. 



D st (a,b) = H t (a,b)-H s (a,b) = ^\a-b\ 2 , (4a) 

D st (a,b) \a-b\ 2 

Gs ^ b) = ^Kby = 4-|a + b| 2 ' (4b) 

n , .v 2JJ rt (o,b) |a-b| 2 ,. . 

D(a ' b) = l-H s (aM = W^W ■ (C) 

Expressions Q are equivalent to the (biased) estimators used for calcula- 



tion of genetic differences as compiled e.g. in (Jost, 2008). A multilocus value 



of D can be obtained by taking the harmonic mean across loci (Crawford 



2010). 



3. A new concept for estimating directional measures of genetic 
differentiation 

Let us start with a thought experiment. Imagine two populations A and 
B with a strong gene flow from A to B but without any gene flow in the 
opposite direction, e.g. as observed in systems driven by ocean currents. In 
addition, population B may be in contact with other populations which are 
genetically different from A and which contain alleles not present in A. How 
would such a situation be reflected in the allelic frequencies? Generally we 
can expect that most alleles present in A are represented in population B, 
whereas alleles present in B may or may not be present in A due to lack 
of gene flow from B to A. Hence, in case of neutral mutations, the relative 
allele frequencies of A should be more or less reflected in B. An idealized 
example of a allelic matrix is listed in Table [TJ In this example allele 3 is 
present in population B only. Alleles 1 and 2 present in population A are 
represented at reduced frequencies but equal proportions in population B. 
From the distribution of allele frequencies it becomes evident, that there is 
no gene flow from B to A, whereas there might be gene flow from A to B. 

How can this concept be formalized? For each pair of populations to be 
investigated (populations A and B in this example), we introduce a hypo- 
thetical pool of migrants with an allelic composition inferred from the two 
populations investigated. That is, we introduce a hypothetical population 
with an allelic distribution f(a,b). From our thought experiment we can 
define a number of requirements for f(a, b): 





Population A 


Population B 


Allele 1 


0.4 


0.2 


Allele 2 


0.6 


0.3 


Allele 3 


0.0 


0.5 



Table 1: Allelic matrix A of the thought experiment consisting of two populations A and 
B with directional gene flow from A to B. 



1. f(a,b) should be symmetric in its arguments, since the order of pop- 
ulations is arbitrary. 

2. Alleles not represented in one of the populations can be assumed not 
to be relevant for migration. As a consequence / should be non-zero 
only for alleles present at non-zero frequencies in both populations. 



A general form for f(a, b) consistent with these two conditions would be 



K 



Pk}OL k 



Vi 



(5) 
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with arbitrary exponents otk, Pk > for all k and an arbitrary K defining 
the number of summands involved. In the optimal example data discussed 
in connection with the thought experiment (Table 111) we have ai/a 2 = hyjb^i 
which mirrors the fact that alleles 1 and 2 in population B originate from a 
strong gene flow from A. In order to be consistent with the thought experi- 
ment, we require the pool of migrants to reproduce this proportion of alleles 
as well: 

3. In case of strong gene flow in only one direction and genetically totally 
differentiated source populations we require fi(a, b)/fj(a, b) = a>i/aj = 
bi/bj for all combination of alleles i and j present in both populations. 

4. As a vector of allele frequencies / needs to be normalized, i.e. it needs 
to fulfill Zifi = 1- 



From the 3rd requirement on / we can demand a^ + /?/. 
an equation of the form 



1 for any k, making 



A' 



fi(a,b) =7^ 7fc W* 6 -- 
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with arbitrary < a& < 1 for all k < K the most general functional ansatz 
for the allelic frequencies in the pooled populations. The constants 7 and 
7fc are related to the previously defined constants 7^ but just as well can be 
considered as new constants here. For the time being and the rest of this 
paper we will stick to the simplest available function in this class, namely 
«i = 71 = 1/2 and 7^ = Vz 7^ 1, resulting in 

fi(a,b) = 7^«A Vz (7) 

with 7 = (y\ a/^A) • Thus, f(a, b) are the normalized geometrical means 
of the respective components of a and b. 

As a measure for directional differentiation, we can now compare both A 
and B respectively to their hypothetical pool of migrants f(a,b). For this 
purpose any of the standard, non-directional measures for genetic differentia- 
tion introduced in Section [2] can be applied. With the idealized example data 
listed in Table[l]we find f(a, b) = a, implying that the genetic constitution of 
A is equal to the hypothetical pool of migrants, whereas the genetic differen- 
tiation between the pool and population B is non-zero (G s t(b, f(a, b)) = 0.19 
and D(b, f(a, b)) = 0.55). From this result we can argue that there is very 
low potential for gene flow from B to A whereas the potential of gene flow 
from A to B is much higher, which is consistent with the thought example. 

In the general case of a number of P populations, we can define aPx?- 
matrix B, containing of directional measures for genetic differentiation, as 



( b n (A,E(;-)) ••• b 1P (A, £( v )) \ 
B{A,E(. r ))= : ... : 

\b P1 (A,E(;-)) ■■■ b PP (A,E(;-)) J 



(8) 



where the individual elements are defined as 

b lJ (A,E(;-)) = E(a i ,f(a i ,a j )) Vi,j . (9) 

Here, E(-,-) is a placeholder for estimates of genetic differentiation, such 
as G st and D. The value of a particular matrix element b^ can now be 
understood as an indicator for the potential of migration from population i 
to population j. Low values indicate high potential for migration and high 
values indicate low potential for migration in the direction of interest. 



4. Estimation of migration 

In Wright's Island models F st and also G st can directly be related to 



migration rates (Wright, 1931). In a similar manner Jost (2008) discusses the 
dependence of D on migration and mutation and its potential for estimating 



migration from allelic frequencies (Jost, 2008). If we assume that our data 



of interest is consistent with the assumptions of the finite island model with 
infinite number of allelesnthe expression derived in (Jost, 2008) can be used 
for an estimation of migration rates. 

If we assume that the mutation rates are small (i.e. /i 2 <C fi) Equation 
(22) of Jost's 2008 work can be used to estimate the ratio of migration and 
mutation rate, 



m/fi^{n-l)(l-D)/D . (10) 

Since we generally are interested in cases where the migration rate may vary 



between different pairs of populations, we use Equation ( 10 ) with n = 2 for an 



estimation of migration rates between pairs of two populations. We assume, 
that the mutation rate does not differ between the individual populations. 
Since we at this stage are interested in the relative sizes of migration rates 
only, /i can be eliminated from the equation by defining M = m/fi. Relative 
migration between pairs is then estimated as M « (D — 1)/D. 

If we extend this concept to the directional measures of genetic differenti- 
ation introduced in the previous section, similar expressions can be used for 
the estimation of directional, asymmetrical migration rates. For this purpose, 
we define a P x P-matrix C of relative migration rates as 



/ cn(A) 



C(A) 



C\P 



(A) \ 



\ c P1 (A) ■■■ cpp(A) ) 

where the individual elements are defined as 

1 - D(a,i,f(ai,aj)) 



CiM) 



D(ai,f(ai,aj)) 



Vz,j 



(11) 



(12) 



2 The most important assumptions are: We are dealing with a fixed number of n popula- 
tions of finite and equal size, interacting through migration of globally dispersing migrants 
at a certain rate, m. In addition to migration neutral mutations occur at rate /i and result 
in new alleles. Generally the genetic properties of the model are investigated in equilib- 



The respective matrix elements Cij(A) here exhibit an estimate for the relative 
migration rate M from population i to population j and can be evaluated 
using matrix algebra. 

5. Simulations of genetic data with known migration 

To test our approach we simulated a system with known migration. The 
simulation was done in Python (Version 2.7.3) using SimuPop (Version 1.1.0) 
(Peng and Kimmel, 2005). Allele frequencies were stored in Numpy ar- 



rays (Version 1.6.1). Our source code is available as a git-repository at 
(https://gitorious.org/ddsim). Four populations were simulated for 1500 
non-overlapping generations. Each of the populations consists of 1 000 sex- 
ually hybrid individuals of a diploid species. Four microsatellite loci on one 
chromosome were considered and mutation rate was fixed to fi = 0.0001 per 
allele and generation. The simulation was initialized from allele distribu- 
tions drawn independently from a pool of 256 alleles. The simulation was 
run as two phases. First, the populations were evolved without migration 
for 500 generations. As Figure [T] shows, the number of alleles present in each 
population drop sharply right after populations are initiated. The argument 
for this initial, non-migratory phase was to not let genetic drift caused by 
the artificial population structures of the early generations cloud the results. 
In the second phase, migration was added to the evolutionary simulation, 
keeping all other parameters equal. 




Population 

1 

2 

3 



400 600 800 1000 
Generations 



1200 



1400 



1600 



Figure 1: The number of alleles in the populations over generations 
The iterated simulation steps were as follows: 

Pre-mating: Mutation A Stepwise Mutator (StepwiseMutator) was 
invoked to simulate microsatellite data. Mutation rate was set to 0.0001 
and number of alleles to 256. Default settings were used, with exception 
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of specifying mutation rate and max allele index as indicated above. 
With default mutation step size 1, an allele, e.g. number 10 may only 
evolve into allele number 9 or 11. 

Mating: Mating was simulated by random sampling of pairs of parents 
within each population (RandomMating). The two alleles characteriz- 
ing each offspring was selected randomly, each at a probability of 1 — /i 
from the respective genetic material of the parents (at equal chance). 
Default settings were used, except specifying the resulting population 
sizes (1000 individuals). After mating the parental generation was 
removed from the simulation. 

Post-mating: Migration (Phase two) During the second phase, migra- 
tion was invoked via Migrator. Migration rates were determined by a 
migration matrix Mi g, all other parameters default. For each i and j a 
randomly drawn number, based on the probability indicated in Migij, 
of individuals migrate from population i to population j. Individuals 
could only migrate once and due to the migration, population fluctuate 
slightly in size around the intended value of 1 000. 

The migration matrix Mig used for simulation was: 



Mig 



( 0.0002 0.0001 \ 

0.0002 



\ 0/ 



(13) 



Mig represents a migration pattern between the four populations as 
shown in Figure [2j 




Figure 2: The migration pattern as in the matrix Mig. 
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Results from our estimation procedure 

At each generation directional D-values (Dd) was calculated for all loci, 
the harmonic mean of these values over generations is plotted in Figure |3j 
The estimated matrix elements show strong fluctuations between generations. 
These stochastic fluctuations as plotted in the figure demonstrates that the 
accuracy of estimates may differ between generations. 
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Figure 3: Multilocus Devalues over generations for the different directions and popula- 
tions. The figure is divided in two for better visualization of the different lines. All lines 
are however present in both figures although half of them are transparent. 

However the pattern in Figure [3] shows that 1— >2, 1—)- 3 and 2— >3 generally 
have lower D^-values (indication higher migration) then the other dispersal 
paths. It is also evident that population 4 quite fast reach high differenti- 
ation values. At the last generation (1500) a migration matrix (Mo d ) was 
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calculated from the D^- values according to Equation ( 12 ). Since only relative 
migration is estimated, the migration matrix for clarity was normalized by 
dividing all elements by the value of the largest matrix element, yielding 



M. 



D d 



■X- 


0.49 


1.00 


0.00 \ 


0.01 


* 


0.99 


0.00 


0.01 


0.01 


* 


0.00 


\ 0.00 


0.00 


0.00 


*y 



(14) 



We see that the major patterns from the migration matrix Mig (13) used 
in the simulation is reproduced, indicating that our approach is able to de- 
tect migrations patterns from allele frequency data. Population 4 does not 
share any alleles with the other three populations and thus shows migration 
values of zero. The values below the diagonal is low as expected from the 
pattern shown in Figure [2j Having Figure [2] in mind it is also expected that 
Mo d i3 show higher migration then Migi% since population 3 can receive al- 
leles originating from population 1 from two directions. However due to the 
stochastic fluctuations the accuracy of the method differs between genera- 
tions and between runs. Since experimentalists typically have access to data 
from only one generation it is desirable to develop techniques for classifying 
the accuracy and stability of the estimated migration values, which will be 
the subject of future studies. 

6. Conclusions 

We present a new simple and straightforward approach to define a direc- 
tional component to measures of genetic differentiation. The approch can 
easily be applied to genetic samples if allele frequencies are known. Direc- 
tional measures of genetic differentiation can be used for the estimation of 
asymmetric migration patterns. 
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